{
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# Bayesian Interpretation of Regularization"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## (a)\n",
    "\n",
    "We have \n",
    "$$\\begin{align*} \n",
    "p(\\theta\\mid x,y) &= \\frac{p(\\theta, y \\mid x)}{p(y\\mid x)} \\\\\n",
    "&= \\frac{p( y \\mid x, \\theta) p(\\theta \\mid x)}{p(y\\mid x)} \\\\\n",
    "&= \\frac{p( y \\mid x, \\theta) p(\\theta)}{p(y\\mid x)}.\n",
    "\\end{align*}$$"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Because the denominator does not depend on $\\theta$, we conclude\n",
    "$$ \\theta_\\text{MAP}=\\arg\\max_\\theta p(\\theta\\mid x,y) = \\arg\\max_\\theta p(y\\mid x,\\theta)p(\\theta).$$"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## (b)\n",
    "\n",
    "If $\\theta \\sim \\mathcal N(0,\\eta^2I)$, then\n",
    "$$ p(\\theta) = \\frac 1{(2\\pi)^{n/2} \\eta^n} \\exp\\left(-\\frac 1 2\\theta^T\\eta^{-2}\\theta\\right) =\\frac 1{(2\\pi)^{n/2} \\eta^n} \\exp\\left(-\\frac 1 2\\eta^{-2} \\|\\theta\\|_2^2\\right) $$"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Therefore by (a) we have  \n",
    "$$\\begin{align*}\n",
    "\\theta_\\text{MAP} &= \\arg\\max_\\theta \\log p(y\\mid x,\\theta) + \\log p(\\theta) \\\\\n",
    "&= \\arg\\min_\\theta -\\log p(y\\mid x,\\theta) +\\frac 1 2 \\eta^{-2} \\|\\theta\\|_2^2.\n",
    "\\end{align*}$$"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Therefore $\\lambda = \\frac 1 2\\eta^{-2}$."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## (c)\n",
    "\n",
    "With $\\vec y = X\\theta +\\vec \\epsilon$ we get\n",
    "$$ p(\\vec y\\mid X,\\theta) \\propto \\exp(-\\frac 1 2\\sigma^{-2} \\|\\vec y -X\\theta\\|_2^2),$$"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "so that \n",
    "$$ \\begin{align*}\n",
    "\\theta_\\text{MAP} &= \\arg\\min_\\theta \\sigma^{-2} \\|\\vec y -X\\theta\\|_2^2+ \\eta^{-2} \\|\\theta\\|_2^2 \\\\\n",
    "&= \\arg\\min_\\theta J(\\theta),\n",
    "\\end{align*}$$\n",
    "where \n",
    "$$ J(\\theta):= \\sigma^{-2} \\left(X\\theta-\\vec y\\right)^T\\left(X\\theta -\\vec y\\right)+ \\eta^{-2} \\theta^T\\theta.$$"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "$J(\\theta)$ is a convex (quadratic) function of $\\theta$. This means we can find $\\theta_\\text{MAP}$ as the unique solution $\\theta$ of $\\nabla_\\theta J(\\theta)=0$, where the gradient is"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "$$ \\nabla_\\theta J(\\theta) = 2\\sigma^{-2}X^T\\left(X\\theta-\\vec y\\right)+ 2\\eta^{-2}\\theta.$$"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "This means that \n",
    "$$ \\theta_\\text{MAP} = \\left(X^TX + \\frac{\\sigma^2}{\\eta^2}I\\right)^{-1}X^T\\vec y.$$"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## (d)\n",
    "With the stated assumptions the prior distribution of $\\theta$ is given by\n",
    "$$\\begin{align*}\n",
    "p(\\theta) &=\\prod_{i=1}^n p(\\theta_i) \\\\\n",
    "&= \\prod_{i=1}^n \\frac 1{2b}\\exp \\left(-\\frac 1b |\\theta_i|   \\right)\\\\\n",
    "&\\propto \\exp \\left(-\\frac 1b  \\|\\theta\\|_1 \\right).\n",
    "\\end{align*}$$"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Reusing the calculation of $p(\\vec y\\mid X,\\theta)$ from part (c) we get\n",
    "$$ \\begin{align*}\n",
    "\\theta_\\text{MAP} &= \\arg\\min_\\theta \\frac 1 2\\sigma^{-2} \\|\\vec y -X\\theta\\|_2^2+ \\frac 1b \\|\\theta\\|_1 \\\\\n",
    "&= \\arg\\min_\\theta J(\\theta),\n",
    "\\end{align*}$$"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "where \n",
    "$$ J(\\theta):= \\|\\vec y -X\\theta\\|_2^2+ \\frac {2\\sigma^2}b \\|\\theta\\|_1 ,$$\n",
    "so $\\gamma = \\frac {2\\sigma^2}b $."
   ]
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "Python 3",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.6.10"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 4
}
